Scalable Persisting and Querying of Streaming Data by Utilizing a NoSQL Data Store
Authors
Abstract
Relational databases provide technology for scalable queries over persistent data. In many application scenarios, however, conventional relational database technology has a problem: loading large data logs produced at high rates into a database management system (DBMS) may not be fast enough, because of the high cost of indexing and converting data during loading. As an alternative, a modern indexed parallel NoSQL data store, such as MongoDB, can be utilized. In this work, MongoDB was investigated for its performance in loading, indexing, and analyzing data logs of sensor readings. To investigate the trade-offs of this approach compared with relational database technology, a benchmark of log files from an industrial application was used for performance evaluation. Indexing is required for scalable query performance. The evaluation covers both the loading time for the log files and the execution time of basic queries over the loaded log data, with and without indexes. For comparison, we investigated the performance of a popular open-source relational DBMS and a DBMS from a major commercial vendor. The implementation, called AMI (Amos Mongo Interface), provides an interface between MongoDB and an extensible main-memory DBMS, Amos II, where different kinds of back-end storage managers and DBMSs can be interfaced. AMI enables general online analyses through queries over data streams persisted in MongoDB as a back-end data store. It furthermore enables integration of NoSQL and SQL databases through queries to Amos II. The performance investigation used AMI to analyze the performance of MongoDB, while the relational DBMSs were analyzed by utilizing the existing relational DBMS interfaces of Amos II.
Acknowledgements
First of all, I would like to thank my grandfather, Abul Hossain, my lifelong role model, who inspires me at every step of my life to overcome challenges. I am grateful to the Swedish Institute for granting me a scholarship to study and do research for two years in the wonderful country of Sweden. Many thanks to my aunt, Jasmin Jahan, my parents, and my wife. I am grateful for everything that you gave me.
Similar resources
NoSQL Approach to Large Scale Analysis of Persisted Streams
A potential problem with persisting large volumes of streaming logs in conventional relational databases is that loading large volumes of data logs produced at high rates is not fast enough, due to the strong consistency model and the high cost of indexing. As a possible alternative, state-of-the-art NoSQL data stores that sacrifice transactional consistency to achieve higher performance and scalabil...
Replex: A Scalable, Highly Available Multi-Index Data Store
The need for scalable, high-performance datastores has led to the development of NoSQL databases, which achieve scalability by partitioning data over a single key. However, programmers often need to query data with other keys, which data stores provide by either querying every partition, eliminating the benefits of partitioning, or replicating additional indexes, wasting the benefits of data re...
An Approach of SQL to JSON Transformation For Handling Database Operations
Nowadays NOSQL databases are becoming more popular. Companies like Google, Facebook, and Amazon have created their own NOSQL databases based on their requirements. Different NOSQL databases follow different querying approaches, whereas traditional databases like MySQL, ORACLE, etc. follow SQL for querying. Most of the companies are shifting from traditional databases to NOSQL ...
Distributed NoSQL Storage for Extreme-Scale System Services
Today, with rapidly accumulating data, data-driven applications are emerging in science and commercial areas. On both HPC systems and clouds, the continuously widening performance gap between storage and computing resources prevents us from building scalable data-intensive systems. Distributed NoSQL storage systems are known for their ease of use and attractive performance and are increasingly u...
A Review and Design of Framework for Storing and Querying RDF Data Using NoSQL Database
This paper reviews existing systems and describes the design of an RDF database system that uses a NoSQL database to store the data, with the aim of enhancing the performance of Semantic Web applications. RDF data is a standard form of data expressed as Subject-Predicate-Object statements called triples, which are stored in a database called a triple store. Typically, an RDF database system uses the SPARQL query language to query the RDF ...